Papers and Reading:

- Mikolov et al., word2vec
- arXiv:1402.3722v1
- Stack Overflow: why does word2vec use 2 representations for each word?
- https://arxiv.org/pdf/1310.4546.pdf
- Paragraph2Vec, Mikolov et al., ICML 2014
- MRNet-Product2Vec, ECML-PKDD 2017
- Crowdsourcing, Howe (book)
- Raykar et al., JMLR 2010 (EM algorithm)
- Missing labels: Raykar et al., JMLR 2010
- Modeling task complexity: Welinder et al. 2010 and Whitehill et al., NIPS 2010
- Sequential crowdsourced labeling as an MDP, Raykar et al.
- Deep Structured Semantic Model (DSSM), Microsoft Research
- VisW2V, CVPR 2016
- arXiv:1602.05568
- arXiv:1801.03244, ICLR 2018
- iWGAN, Gulrajani et al.
- eCommerceGAN


t-SNE: projects high-dimensional data into a low-dimensional space (typically 2-D or 3-D) for visualization.
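
A minimal sketch of visualizing learned embeddings with t-SNE, assuming scikit-learn and matplotlib are available; the word vectors below are random stand-ins for real learned embeddings.

```python
# Minimal t-SNE visualization sketch (assumes scikit-learn and matplotlib).
# The embeddings below are random stand-ins for real learned word vectors.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

words = ["dog", "cat", "car", "truck", "apple"]
embeddings = np.random.rand(len(words), 300)   # replace with real 300-d vectors

# Project 300-d vectors down to 2-d for plotting.
coords = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(embeddings)

plt.scatter(coords[:, 0], coords[:, 1])
for word, (x, y) in zip(words, coords):
    plt.annotate(word, (x, y))
plt.show()
```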

For crowdsourced labeling: CrowdFlower, Amazon Mechanical Turk

Lecture 1 - Representation Learning

  • The layer just before the classification/regression layer is a good example of a learned representation.
  • A representation is basically a feature.
  • These learned representations may work well in other kinds of tasks too.

Word2Vec

  • Earlier, words were expressed as sparse one-hot vectors: if the vocabulary had 10,000 words and "dog" was the 100th word, then the 100th value was 1 and all others were zero. One-hot encoding works for simple tasks, but it fails on complex ones because it captures no semantic relationship between words.
  • Word2Vec's intuition: a word is known by the company it keeps.
  • How are these embeddings learned? From neighbors: over a huge text corpus, we observe which sets of words co-occur with each other. This is done through a neural network, albeit a shallow one.
  • Training is done with continuous bag of words (CBOW) or skip-gram.
  • CBOW architecture: a fully connected NN where each input node receives a one-hot encoding; the layer dimensions are (C×V, N, V), where C is the window size, V the vocabulary size, and N the embedding dimension.
  • Skip-gram architecture: the reverse of the CBOW architecture (predict the context words from the center word).
  • Skip-gram works well with small amounts of training data, while CBOW trains several times faster.
  • gensim is a popular library for training word2vec (see the sketch after the limitations list below).
  • There is an inherent challenge for word2vec: with a vocabulary of 10,000 words and 300-dimensional embeddings, you already have 3 million weights, so every full-softmax update is expensive. Negative sampling overcomes this by updating only a small number of sampled negative words per step.

  • Limitations:

    1. Sometimes brittle; not a very robust concept.
    2. Takes a long time to train.
    3. Hard to understand and visualize.
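
A minimal gensim sketch contrasting CBOW and skip-gram and enabling negative sampling; this assumes gensim ≥ 4.0, and the toy corpus and hyperparameter values are placeholders.

```python
# Minimal word2vec sketch with gensim (assumes gensim >= 4.0).
# The toy corpus and hyperparameters below are placeholders.
from gensim.models import Word2Vec

sentences = [
    ["the", "dog", "chased", "the", "cat"],
    ["the", "cat", "sat", "on", "the", "mat"],
]

# sg=0 -> CBOW (predict center word from context); sg=1 -> skip-gram.
# negative=5 enables negative sampling instead of a full softmax.
model = Word2Vec(
    sentences,
    vector_size=300,  # embedding dimension N
    window=2,         # context window size C
    sg=1,             # skip-gram
    negative=5,
    min_count=1,
)

vector = model.wv["dog"]                    # 300-d embedding for "dog"
print(model.wv.most_similar("dog", topn=3))
```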

GloVe:

  • Instead of using raw co-occurrence probabilities, GloVe works with ratios of co-occurrence probabilities.
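
To make the ratio idea concrete (the ice/steam example from the GloVe paper): for a probe word k, the ratio of co-occurrence probabilities discriminates relevant from irrelevant words better than either probability alone.

```latex
% Ratio of co-occurrence probabilities (GloVe intuition).
% P(k \mid w) = probability that word k appears in the context of word w.
\frac{P(k \mid \text{ice})}{P(k \mid \text{steam})}
\begin{cases}
\gg 1 & \text{if } k \text{ relates to ice (e.g. ``solid'')} \\
\approx 1 & \text{if } k \text{ relates to both or neither (e.g. ``water'', ``fashion'')} \\
\ll 1 & \text{if } k \text{ relates to steam (e.g. ``gas'')}
\end{cases}
```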

Paragraph2vec:

  • Uses the Distributed Memory model (PV-DM, similar to CBOW) and the Distributed Bag of Words model (PV-DBOW, similar to skip-gram).
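
A minimal gensim Doc2Vec sketch, assuming gensim ≥ 4.0; dm=1 gives the Distributed Memory (PV-DM, CBOW-like) variant and dm=0 the Distributed Bag of Words (PV-DBOW, skip-gram-like) variant. The corpus and parameters are placeholders.

```python
# Minimal paragraph2vec (Doc2Vec) sketch with gensim (assumes gensim >= 4.0).
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument(words=["the", "dog", "chased", "the", "cat"], tags=["doc0"]),
    TaggedDocument(words=["the", "cat", "sat", "on", "the", "mat"], tags=["doc1"]),
]

# dm=1 -> Distributed Memory (PV-DM, CBOW-like); dm=0 -> PV-DBOW (skip-gram-like).
model = Doc2Vec(docs, vector_size=100, window=2, dm=1, min_count=1, epochs=40)

doc_vector = model.dv["doc0"]                                    # learned paragraph vector
new_vector = model.infer_vector(["a", "dog", "on", "a", "mat"])  # vector for unseen text
```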

Lecture 2 - Learning From crowds

Crowdsourced Data Annotation

  • Getting objective labels [the gold standard] is expensive, tedious, invasive, and potentially dangerous.
  • What we actually get is subjective labels [an approximate gold standard] from experts.
  • But even that is expensive, therefore crowdsourcing is the way to go.
  • The problem with crowdsourcing is how to consolidate this multi-annotated data.
  • One way is majority voting; the problem with majority voting is that it weighs everyone equally, so it fails when only one annotator is a good labeler and the rest are novices.
  • So you need annotator models that take each annotator's sensitivity and specificity into account.
  • Use the EM algorithm, initialized with majority voting (see the sketch under "Learning from crowds" below).
  • A few extensions:
    1. Bayesian Approaches
    2. Variational Bayes
    3. Modeling task Complexity
    4. Missing Labels
    5. Categorical, ordinal, and continuous annotations.

Learning from crowds

  • Proposed an EM algorithm that learns a classifier jointly with the annotator model: the M-step fits a logistic regression on the soft (probabilistic) labels produced by the E-step.
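
A simplified sketch of this EM scheme for binary labels, with per-annotator sensitivity/specificity, majority-vote initialization, and a logistic-regression-style M-step on soft labels; numpy only, and the plain gradient steps below are an illustrative stand-in for the paper's exact optimization.

```python
# Simplified sketch of the Raykar et al. (JMLR 2010) EM algorithm for binary labels.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def crowd_em(X, Y, n_iters=50, lr=0.1):
    """X: (n, d) features; Y: (n, m) binary labels from m annotators."""
    n, d = X.shape
    mu = Y.mean(axis=1)      # initialize soft true labels with (soft) majority voting
    w = np.zeros(d)

    for _ in range(n_iters):
        # M-step: annotator sensitivity/specificity from current soft labels.
        alpha = (mu[:, None] * Y).sum(axis=0) / mu.sum()                    # P(y_ij=1 | true=1)
        beta = ((1 - mu)[:, None] * (1 - Y)).sum(axis=0) / (1 - mu).sum()   # P(y_ij=0 | true=0)

        # M-step: logistic regression on soft labels (a few gradient steps).
        for _ in range(20):
            p = sigmoid(X @ w)
            w += lr * X.T @ (mu - p) / n

        # E-step: posterior probability that the true label is 1.
        p = sigmoid(X @ w)
        a = np.prod(alpha ** Y * (1 - alpha) ** (1 - Y), axis=1)
        b = np.prod(beta ** (1 - Y) * (1 - beta) ** Y, axis=1)
        mu = a * p / (a * p + b * (1 - p) + 1e-12)

    return w, alpha, beta, mu
```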

Accuracy, cost, time

  • Ways to reduce cost include paying based on performance and removing malicious (spammer) annotators.
  • Spammer metric: a score for detecting annotators whose labels do not depend on the true label.
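
One way to make this concrete for binary labels, using the sensitivity/specificity model above: an annotator who labels independently of the true class has sensitivity plus specificity equal to 1, so the distance of that sum from 1 can serve as a spammer score (some formulations square this quantity).

```latex
% Spammer score sketch for annotator j with sensitivity \alpha_j and specificity \beta_j.
% A spammer labels independently of the true class, so \alpha_j + \beta_j \approx 1 and S_j \approx 0;
% a perfect annotator has S_j = 1.
S_j = \lvert \alpha_j + \beta_j - 1 \rvert
```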

Sequential Crowdsourced labeling

  • Framed as a Markov decision process, incorporating the cost of each additional label.

Open Problems:

  • Incentive Design.

Lecture 3

Deep Structured Semantic Model (DSSM)

  • Architecture: a term vector goes through word hashing and then multi-layer non-linear projections that output semantic features; relevance between the query and document features is measured with cosine similarity, and a softmax over these similarities gives the final output.
  • You want the query being searched for and the document that is clicked to have embeddings close to each other.
  • Word hashing represents a word as its letter trigrams (3-character substrings, with boundary markers); see the small sketch below.
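
A small sketch of word hashing as letter trigrams; the '#' boundary marker and the function name are illustrative choices.

```python
# Small sketch of DSSM-style word hashing (letter trigrams).
def letter_trigrams(word):
    padded = f"#{word}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

print(letter_trigrams("good"))   # ['#go', 'goo', 'ood', 'od#']
```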

VisW2V

  • Word2vec for vision.
  • Training data: scenes and associated text

Med2Vec

  • Distributed representations for both visits and medical codes.

Product2Vec

  • Large e-commerce companies typically process billions of orders.
  • A semantic representation and understanding of these orders can help with recommendations, order similarity, and forecasting.
  • A naive representation would be tf-idf.
  • Instead, a discriminative multi-task neural network was trained.
  • Different signals pertaining to a product were explicitly injected: static signals like color, material, and weight, and dynamic signals like price, popularity, and views. The goal was to learn a general representation.
  • This is referred to as a Multi-task Recurrent Neural Network (MRNet); the tasks are to predict color, material, price, etc.
  • Alternating optimization is used: at each step a task is selected at random and the gradient is computed only for that task's loss (see the sketch after the Future Work list).
  • These embeddings can be used in downstream problems like electrical-plug classification, ingestible classification, etc.
  • Future Work :-
    1. Incorporating more signals
    2. Product image Simulation
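
A minimal sketch of the alternating (per-task) optimization idea mentioned above, written with PyTorch as an assumption; the shared encoder, the two toy task heads, and all dimensions are placeholders rather than the actual MRNet-Product2Vec architecture.

```python
# Minimal sketch of alternating multi-task optimization (assumes PyTorch).
# Shared encoder, toy task heads, and dimensions are placeholders.
import random
import torch
import torch.nn as nn

shared = nn.Sequential(nn.Linear(128, 64), nn.ReLU())   # shared product encoder
heads = {
    "color": nn.Linear(64, 10),     # predict one of 10 colors
    "material": nn.Linear(64, 5),   # predict one of 5 materials
}
criterion = nn.CrossEntropyLoss()
params = list(shared.parameters()) + [p for h in heads.values() for p in h.parameters()]
optimizer = torch.optim.Adam(params, lr=1e-3)

for step in range(100):
    task = random.choice(list(heads))                        # pick one task at random
    x = torch.randn(32, 128)                                 # placeholder product features
    y = torch.randint(0, heads[task].out_features, (32,))    # placeholder labels

    logits = heads[task](shared(x))
    loss = criterion(logits, y)      # gradient flows only through this task's head

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```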

EcommerceGAN

Lecture 4 - AI for Fashion

  • 4 Key Aspects - Search, Recommend, Discover, Engage.
  • Fashion Taxonomy
  • Color Taxonomy
  • Research Problems:
    1. Taxonomy curation and expansion
    2. Embeddings
    3. New fashion themes and concepts
    4. Siamese-style architectures
    5. Learning pose- and background-invariant representations
    6. Aspect-based visual search
    7. Multi-scale attention-based models for object detection
    8. Outfit creation
    9. Multi-modal dialog
    10. Normalizing sizes across brands
    11. Color trend analysis
  • Theme capsules, e.g. athleisure, shacket, and skort
  • Thematic querying
  • Visual Fashion Tags
  • Dominant Color
  • H&M sells about 4.3 billion dollars' worth of clothes since they don't have a second-hand store model.
  • Cognitive Couture
  • WGSN fashion
  • Color trend analysis - Popular color and trending color
  • Cognitive Collection - Jason Grech + IBM Watson
  • Cognitive Prints - Can Computers be creative ? Falguni Shane example.
  • WWD + IBM fashion article
